UK Patent Application (19) GB (11) 2312125 (13) A



(21) Application No 9705970.3 

(22) Date of Filing 21.03.1997 



(32) 11.04.1996 (33) GB 



(71) Applicant(s)

Discreet Logic Inc

(Incorporated in Canada - Quebec)

5505 St-Laurent Blvd, Suite 5200, Montreal,
Quebec H2T 1S6, Canada



(43) Date of A Publication 15.10.1997 

(51) INT CL6

H04N 5/222 , G06T 17/00 , H04N 5/272

(52) UK CL (Edition O ) 

H4F FD12X FD2B FD27T2 FD30K FD31G FD31X FGJ 

(56) Documents Cited 

GB 2305050 A GB 2271241 A US 4970666 A 
SMPTE Journal Vol 103, No 6, June 1994, pages 386 to 390

(58) Field of Search

UK CL (Edition O ) H4D DLAB DLFB DLVX H4F FGJ
FGM

INT CL6 G06T 15/10 17/00 , H04N 5/14 5/222 5/265
5/272 9/74



(74) Agent and/or Address for Service 
Atkinson & Co 

The Technology Park, 60 Shirland Lane, 
Lower Don Valley, SHEFFIELD, S9 3SP,
United Kingdom 



(54) Virtual studio with zoom control 

(57) Real image data is generated by a camera, in addition to positional data representing characteristics
of said camera, including an indication of zoom control. A synthesized image 202-204 is generated for
combination with the real image and the perceived focus of the synthesized image is adjusted in response to
zoom control adjustments, so as to effect a focusing difference between a portion of the real image and a
portion of the synthesized image.




Figure 2 






This print incorporates corrections made under Section 117(1) of the Patents Act 1977.



[Figure 4: labels recovered from the drawing: STUDIO MIXER; MONITOR]



[Figure 6: labels recovered from the drawing: CONTROL; VIDEO TEXTURE; CAMERA POSITION (602); VIDEO RATE SET CONSTRUCTION; SYNTHESIZED OUTPUT]



Figure 9 (flowchart; steps recovered from the drawing)

901  SELECT NEXT OBJECT
902  RECURSIVELY CONCATENATE OBJECT TRANSFORMATIONS
903  CONCATENATE VIEWING MATRIX
904  CONCATENATE PROJECTION MATRIX
905  OBJECT CULLING
907  ADD OBJECT TO DISPLAY LIST
908  ANOTHER OBJECT ?  (YES: return to 901; NO: continue to 909)
909  SUPPLY DISPLAY LIST TO RENDERING PROCESSOR



Figure 10 (flowchart; steps recovered from the drawing)

1001  N = 0
1002  CLEAR RGB AND Z BUFFERS
1003  CALCULATE VARIED PROJECTION MATRIX
1004  DRAW SCENE
1005  ACCUMULATE VARIED PROJECTION OF SCENE INTO BUFFERS
1006  INCREMENT N
1007  N = QUALITY ?  (NO: repeat the loop; YES: continue to 1008)
1008  DIVIDE ACCUMULATED BUFFER CONTENTS BY N
1009  DEFOCUS COMPLETED



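Figure 12a (four-by-four matrix of coefficients a to p relating inputs to outputs)

                  OUTPUTS
              x'   y'   z'   w'
INPUTS    x   a    e    i    m
          y   b    f    j    n
          z   c    g    k    o
          w   d    h    l    p

Figure 12b (expansion of the outputs; recoverable terms: ax + by + cz + dw, ex + fy + gz + hw, ix + jy + kz + lw, mx + ny + oz + pw, expressed relative to w and w')

Figure 12c (expression in w and w')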



[Sheet 13/14: projection matrix terms recovered from the drawing: tan(a) . aspect ratio; tan(a); (far + near)/(far - near); (far . near)/(far - near)]
File: DLBC\P211-UK 






PROCESSING IMAGE DATA 

The present invention relates to processing image data in which real
image data generated by a camera is combined with synthesized image data.

Introduction 

Techniques for generating realistic three-dimensional synthetic images
are becoming established in increasingly diverse applications due to the
steady decrease in cost of high performance processing components, and
the continuing advance in the art of graphic manipulation procedures. As the
realism of synthetic images improves, a clear goal has been identified, which
is to produce synthetic images which are indistinguishable from real images.
While this goal may be attainable when a single image is to be generated,
the rapid generation of picture frames which represent complicated moving
and interacting objects in real time requires considerable computational
resources. This goal is made even more difficult when real images are
combined in real time with synthetic images, as the human eye is sensitive to
subtle differences between image qualities.

An emerging application of real-time three-dimensional graphics is the
virtual studio. In a virtual studio, images of a real set, usually including a
television presenter, are combined with images generated from a virtual set.
Most of the real studio consists of a blue background, which is then replaced
electronically with the virtual set. Parts of the real image which are not
coloured blue are superimposed on the virtual set, in such a way that the final
combined image appears realistic. A studio of this type is disclosed in United
States patent number 5479597 of Armand Fellous.

An advantage of the virtual studio is that only a small real studio space
is required, upon which an image of a much larger virtual studio area may be
imposed, including various three-dimensional stage props and logos specific
to a television programme. Once a recording for a particular programme has
been completed, the entire virtual set may be replaced instantly, so the studio
is ready for use in a completely different television programme. In a traditional
studio, different hardware, in the form of stage props and so on, may be
needed for each different programme. In the course of a week, many dozens of
different television programmes with different stage props may be required,
and these props would either have to be stored carefully, or alternatively
constructed from scratch.

A major constraint in operating a virtual studio is the need to maintain
precise alignment between the characteristics and position of the real camera
and those of a virtual camera which is modelled in the image synthesising
computer. The human eye is easily able to detect subtle differences between
real and virtual images, such as mismatch in focus, which will then result in a
less convincing perception of the combined studio image.

Summary of the Invention 

According to a first aspect of the present invention, there is provided a
method of processing image data, wherein real image data generated by a
camera is combined with synthesized image data, comprising steps of
generating camera positional data representing characteristics of said
camera, including an indication of zoom control; generating a synthesized
image in response to said positional data; and adjusting the perceived focus
of said synthesized image in response to zoom control adjustments, so as to
effect a focusing difference between a portion of said real image and a
portion of said synthesized image.

Preferably, a portion of said synthesized image is defocused to
emphasize its location behind said portion of the real image.

Brief Description of the Drawings

Figure 1 shows a real set in a virtual studio, including a television
monitor;

Figure 2 shows the combined image shown on the monitor shown in
Figure 1;

Figure 3 details control equipment used to generate the combined
image shown in Figure 2, including a graphics processor;

Figure 4 details connections between the graphics processor shown in
Figure 3 and other equipment used in a virtual studio;

Figure 5 details the graphics processor shown in Figure 3 and Figure
4, including a rendering processor and shared memory;

Figure 6 details processes for combining live camera signals with
virtual set images which are performed by the rendering processor shown in
Figure 5;

Figure 7 details data structures stored in the shared memory shown in
Figure 5, including a scene tree, executable scripts and object animation
functions;

Figure 8 details processes and relationships for modifying the scene
tree shown in Figure 7, including a process of constructing a display list;

Figure 9 details the process of constructing a display list shown in
Figure 8;

Figure 10 details an arrangement for de-focusing images generated
substantially in accordance with the operations shown in Figure 8, including
calculating a varied projection matrix;

Figure 11 details the projection matrix used in Figure 10;

Figure 12 details the structure of a matrix of the type shown in Figure
11 and which is used for three dimensional graphical manipulations;

Figure 13 details an algebraic expansion of the projection matrix
shown in Figure 11;

Figure 14 details the edge of a virtual object which has been
de-focused in accordance with the processes shown in Figure 10; and

Figure 15 shows a plan view of the object de-focusing process.

Detailed Description of the Preferred Embodiment 

The invention will now be described by way of example only, with
reference to the accompanying figures identified above.



A virtual studio is shown in Figure 1, which includes a presenter 101
against a blue background 102. A television camera 103, fitted with a zoom
lens 104, is rotatably mounted on a fixed tripod 108. The camera 103
generates a video signal which is supplied to processing equipment along a
video cable 105. Sensors mounted on the camera 103 and between the
camera 103 and the tripod 108 generate signals which define the pan,
rotation and tilt of the camera 103, and the zoom and focus of the zoom lens
104. These signals are combined in interface and processing circuitry
mounted with the camera, and are supplied over an RS432 serial data cable
106 to processing equipment. The presenter 101 is able to view the resulting
combined real and virtual images on a video monitor 107, mounted at the
side of the studio set. In some circumstances, it will be necessary for the
presenter to be aware of the location of virtual objects not physically located
within the real set, in order to maintain a convincing illusion of their presence.
Thus, the presenter may point to a virtual object which does not physically
exist, by co-ordinating their movements with the resulting image shown on
the video monitor 107.

The image displayed on the video monitor 107, shown in Figure 1, is
detailed in Figure 2. The presenter 101 is the only part of the displayed
image included in the combined image. All the other areas 102 of the real
studio within the field of view of the camera 103 are coloured blue, and are
thus replaced by a synthesized virtual set. The components of the virtual set
include a pedestal 202, upon which is a statue 203. In the background there
is a two dimensional backdrop 204 consisting of moving images from a film.

Thus the virtual set includes both three-dimensional and two
dimensional objects, which are viewed by a virtual camera. The virtual
location of the virtual camera is arranged to follow the real location of the real
camera, so that a change in view of the presenter 101 will result in an
appropriate shift in view of the objects in the virtual set. For example, the real
camera 103 may pan to the left and zoom in slightly, so that the centre of the
field of view shifts from the presenter 101 to the statue 203. Because all the
virtual objects are accurately modelled in three dimensions, the parallax
between the statue 203 and the background shifts accordingly. Furthermore,
the two dimensional film clip shown on the virtual backdrop 204 is projected
differently, so as to maintain coherence between real and virtual images.

Control over the virtual studio environment, including the selection of
virtual objects to be included in the overall image produced, is performed
using the equipment shown in Figure 3. A high quality graphics terminal 301,
such as that manufactured by Silicon Graphics Inc, displays the combined
real and virtual images produced by the virtual studio. A graphics processor
302 provides the processing capability for generating the virtual set. The
graphics processor 302 also receives video signals from the real camera 103
and combines these with the synthesised image of the virtual set. The
graphics processor 302 is an SGI Onyx Reality Engine Two, manufactured
by Silicon Graphics Incorporated. An editing terminal 303 is used to control
the set-up of the virtual studio using a text editor. The editing terminal 303 is
connected to an SGI Indigo workstation 304, which provides storage and
editing facilities. The workstation 304 communicates with the graphics
processor 302 via an Ethernet connection. Thus, an operator may control the
graphics environment which is synthesized by the graphics processor 302
and displayed on the high quality graphics terminal 301, using the terminal
303 which is connected to the workstation 304.

Typical operations carried out by operators using the equipment
shown in Figure 3 relate to the particular requirements of operating a virtual
studio. Firstly, it is essential that the locations of the real and virtual cameras
should be matched. Thus, having positioned the camera 103 on its tripod
108, and perhaps selecting a suitable type of lens 104 for the programme which
is to be broadcast or recorded, it is necessary to determine the exact physical
location of the camera. This is done in two stages. Firstly the optical centre of
the lens is located. When mounting a lens on a camera, although the lens is
mounted firmly, its precise location cannot be predicted with absolute
accuracy. Thus, when zooming in and out, the part of the video image which
remains stationary is typically slightly out of alignment with the centre of the
image as it is measured electronically.

For example, in a video camera which uses charge coupled devices
(CCD) as its image sensors, the image comprises a matrix of pixels, with
each pixel comprising three sub-pixels defining the red, green and blue
components, as produced by three separate CCD sensors. The image has a
precise number of pixels in the horizontal and vertical dimensions. Typically
this number may be in the region of six hundred vertical pixels by eight
hundred horizontal pixels. The electronic centre of such an image is located
at the pixel co-ordinates (400,300).

Having mounted a lens, the camera operator zooms in and out in
order to determine which part of the image remains stationary. It is this
location which is then considered to be the optical centre of the camera and
lens combination. Having calibrated the optical centre, the camera operator
need not measure the physical location of the camera; this would not be a
useful measurement, since the measurements that are required must be
made with respect to the precise location of an image focused onto the CCD
plane, which may be located at an unknown, or at least not sufficiently
precisely known, location within the casing of the camera 103.

In order to accurately calibrate the physical location of the camera, or
more correctly, to match the location of the focused image in the real camera
with that produced by the virtual camera, sightings of several known points
in the real studio set are made. Thus, in order to define the location of the
camera in three dimensions, sightings of three points in the studio are made
by matching the optical centre, now marked by a cross on a monitor, with
markers in the studio. The locations of these points in three dimensions are
precisely known, and are fixed. Better accuracy may be achieved by sighting
four or more known points, with inconsistency between the combined results
being averaged to provide a reading of improved accuracy. For example, if
five points are sighted, these five are subdivided into all possible
permutations of groups of three. The position of the camera is calculated for
each permutation, and then the average of the results is used to define the
camera position. Thus a sequence of calibrations is performed by the camera
operator making various sightings, and a terminal operator, using the terminal
303, supplies appropriate control instructions to the system such that data
received from the camera's rotation, pan, tilt, focus and zoom sensors is
combined in the appropriate way during these calibration procedures.
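
The averaging of results over groups of three sightings can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `solve_from_three` is a hypothetical caller-supplied function standing in for whatever geometric calculation derives a camera position from three sightings, which the text does not specify. The groups are treated as unordered, so five sightings give ten groups.

```python
from itertools import combinations


def average_camera_position(sightings, solve_from_three):
    """Average the camera positions computed from every group of three sightings.

    sightings: list of sighting records (opaque here; handed to the solver).
    solve_from_three: hypothetical solver returning an (x, y, z) estimate
        for one group of three sightings.
    """
    if len(sightings) < 3:
        raise ValueError("at least three sightings are required")
    # One estimate per unordered group of three sightings.
    estimates = [solve_from_three(group) for group in combinations(sightings, 3)]
    count = len(estimates)
    # Component-wise mean of all estimates.
    return tuple(sum(e[axis] for e in estimates) / count for axis in range(3))
```

With three sightings there is a single group, so the function simply returns that one estimate.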

The camera 103 shown in Figure 1 supplies two types of electrical
signals. The first type of signal is video, an electrical representation of the
image focused onto the CCD sensors in the camera. The second type of
electrical signal defines the position of the camera and its lens settings. A
typical zoom lens 104 mounted on a television camera includes rings for
zoom, focus and aperture. Of these, the zoom and focus are required in
order to define realistic real-time behaviour of the virtual camera. Thus, rotary
sensors are mounted on the camera lens. These rotary sensors contain twin
optical emitters and detectors, separated by a serrated disc. The disc is
mechanically coupled to the movement of a lens ring, such that the passage
of light between one emitter-sensor pair occurs in precedence to the passage
of light between the other emitter-sensor pair. Thus, the direction of rotation
of the serrated disc may be detected by the precedence of an electrical
signal from either of the optical sensors. Furthermore, rotation of the serrated
disc results in repeated blocking and unblocking of the light reaching each
sensor, and this may be used to determine a change in position. This
technique is known as optical quadrature detection, and generates electrical
pulses which are particularly suitable for interfacing to digital electronic
circuitry.
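
As an illustration of how such pulses may be interpreted, the following sketch decodes a sequence of samples from the two sensors into a relative position. It is a generic quadrature decoder, not circuitry described in this specification. The channel names A and B, the sampled representation, and the choice of which direction counts as positive are all assumptions.

```python
# Transition table for a two-channel quadrature signal.
# A state is the 2-bit value (A << 1) | B. One direction of rotation
# steps through 00 -> 01 -> 11 -> 10 -> 00; the other runs in reverse.
_STEP = {
    (0b00, 0b01): +1, (0b01, 0b11): +1, (0b11, 0b10): +1, (0b10, 0b00): +1,
    (0b00, 0b10): -1, (0b10, 0b11): -1, (0b11, 0b01): -1, (0b01, 0b00): -1,
}


def decode_quadrature(samples, start=0):
    """Return the relative position after a sequence of (A, B) samples.

    samples: iterable of (a, b) pairs, each value 0 (blocked) or 1 (lit).
    start: position assumed before the first sample.
    """
    position = start
    previous = None
    for a, b in samples:
        state = (a << 1) | b
        if previous is not None and state != previous:
            # Transitions where both channels change at once are invalid
            # and are ignored rather than guessed at.
            position += _STEP.get((previous, state), 0)
        previous = state
    return position
```

The count obtained is relative only; an absolute ring position requires a calibration against a known reference.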

Each of the zoom and focus rings has a rotary sensor, which supplies
electrical signals which may be interpreted as providing a relative indication
of the respective ring position. By calibrating the absolute position of the lens
rings with reference to a known visual target, the relative incrementing and
decrementing electrical signals from the rotary sensors can be used to derive
an absolute position of the zoom and focus rings, in conjunction with
appropriate calibration instructions issued from the terminal 303 shown in
Figure 3.

Additional rotary sensors are provided on the camera and its
associated camera head mount, which is a multi-dimensional fixture providing
freedom of movement of the entire camera in dimensions of pan (rotation
about a vertical axis, or vertical panoramic) and tilt (rotation about a horizontal
axis, or horizontal panoramic). The absolute values of these sensors are
determined during the sighting calibration procedure described above.

Connections between the camera 103 and other studio equipment are
summarised in Figure 4. The camera assembly, indicated schematically as
401, generates a video output 402 and positional output 403. The positional
outputs are supplied to an interface 404 which in turn supplies positional data
to an image synthesizing process 405. The image synthesizing process 405
generates a synthesized video image which responds to movements and
adjustments of camera assembly 401 in a way similar to that in which a
conventional video signal would respond to such adjustments.

The conventional video signal generated by the camera assembly 401
is supplied to a video rate chroma keying system 406 arranged to produce a
key or matte signal that responds to the saturated blue background. The
video signal is also supplied as a video input to a video keyer 407,
whereupon the output from the image synthesizing process 405 and the output
from the video camera on video output 402 are combined or keyed in
response to the keying signal generated by the chroma keying system 406.
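
The keying operation can be illustrated per pixel as follows. This is a simplified sketch and not the chroma keying system 406 or the keyer 407 themselves. The blue-dominance test, the `threshold` value, and the representation of frames as lists of rows of (r, g, b) tuples in the range 0 to 1 are assumptions, and a practical keyer would normally produce a soft matte rather than the hard key used here.

```python
def blue_key(pixel, threshold=0.3):
    """Return 1.0 where the pixel is saturated blue, otherwise 0.0.

    pixel: (r, g, b) with components in 0..1.
    threshold: hypothetical margin by which blue must exceed red and green.
    """
    r, g, b = pixel
    return 1.0 if b - max(r, g) > threshold else 0.0


def key_frame(camera, synthetic, threshold=0.3):
    """Combine two equal-sized frames: synthetic where the camera sees blue."""
    combined = []
    for cam_row, syn_row in zip(camera, synthetic):
        combined.append([
            syn if blue_key(cam, threshold) else cam
            for cam, syn in zip(cam_row, syn_row)
        ])
    return combined
```

Pixels belonging to the presenter fail the blue test and pass through unchanged, while blue background pixels are replaced by the corresponding pixels of the synthesized image.
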
The composite output is viewable on a monitor 408, similar to monitor
107 and, in addition, this output is also supplied to a studio mixer 409. The
studio mixer 409 receives other video outputs on lines 410 and a selection
from these video inputs is made to supply an output signal to air on line 411.
This output signal is also viewable on a further video monitor 412.

The graphics processor 302 shown in Figure 4 is detailed in Figure 5.
Four main processors, CPU1 501, CPU2 502, CPU3 503 and CPU4 504,
perform the various calculations and data manipulation procedures
necessary to create and mix the virtual set with images from the real camera.
Each processor has high speed local memory 505, 506, 507 and 508. CPU4
504 is connected directly to a rendering processor 509, which is specifically
designed to perform pixel rendering at high speed.

All four main processors 501, 502, 503 and 504 are connected via a
common parallel interface. The image synthesizing application is split into
logical processing tasks, with the initial conditions and end conditions for each
task made available to all processors, but with the computations
performed within each task done independently. This makes it possible for
each task to be performed at high speed, as there is no need to
communicate with other tasks on other processors until an allocated task is
complete. Furthermore, local high speed memory 505, 506, 507 or 508 may
be used to store data and instructions for each task, reducing the need to
communicate over a global communications bus 511.

When communicating over bus 511, it is necessary to ensure that only
one processor attempts to control the bus 511 at any one time, requiring time
for bus arbitration protocols. Furthermore, if there are four processors, the
maximum data bandwidth of the bus is theoretically divided by four. In
practice the reduction in bandwidth is greater than this, due to the arbitration
protocols.

A further speed restriction is inherent in bus designs which connect
several processors. The speed at which signals may be communicated over
electrical connections is to some extent dictated by the distance over which
the signals must travel. If processors are distributed over several circuit
boards, the speed of the bus 511 is restricted, especially compared with the
speed of data transfers between digital components communicating on a
single or closely adjacent circuit board. Thus, wherever possible, processes
are split into specific tasks, which may take advantage of the particular
processor architecture which is in use. For certain types of task, data may be
shared between processors. Shared memory 512 is provided for this.
Communication with external devices over Ethernet and RS432, and with high
resolution monitors, computer keyboards and so on, is provided by input
output interface 513.

The image synthesis process 405 identified in Figure 4 is detailed in
Figure 6. The camera positional data is supplied to a set construction
process 601, arranged to produce image frames at video rate. Thus, it should
be appreciated that the generation of image frames is performed in real time
such that each frame of a video sequence is individually generated, so as to
ensure that movements and transitions occur smoothly and are perceived as
being as real as real objects added to the virtual scene.

Camera positional data is supplied over a line 602 and external control
is received via a control process 603.

The set construction process 601 is capable of rendering surfaces and
objects from polygonal primitives. In addition, image planes of full-motion
video may be included within the set in response to receiving one or more
video textures from a video texturing process 604.

Procedures for set construction, shown in Figure 6, are defined by data
stored in the shared memory 512 shown in Figure 5. The virtual set is
defined by a data structure known as a scene tree. A representation of the
scene tree and other key data structures stored in shared memory is shown
in Figure 7. The scene tree 701 comprises a number of objects, which are
defined recursively. Thus object 702 represents the stage backdrop 204
shown in Figure 2, and an object defined within the backdrop is a link object
703 to a film clip which is supplied from some external real time video source.

Other simple objects are defined non-recursively, such as the pedestal
202, shown in Figure 2, represented by the non-recursive object 704.
Complex objects, such as the statue 203 which is also shown in Figure 2, are
defined by many layers of recursive objects within an overall object 705
defining the statue. As the scene tree is analyzed, the further down the level
of recursion one goes, the simpler the object. Thus, at the lowest level of
recursion, objects are defined as primitives: in other words, shapes such as
polygons, whose basic structure is understood by the rendering processor
509 and need not be further defined.

Repeated references to a single instance of a primitive object such as
a polygon enable complex three-dimensional structures to be constructed
from simpler ones, to whatever level of detail is required. Also included in the
shared memory are executable scripts 711, which are executed at the
beginning of each frame and perform manipulations on data structures
defined within the scene tree 701. Object animation functions 712 enable
objects within the scene tree to be manipulated in the form of an animation,
for example the rotation of a propeller on a virtual aeroplane object as it flies
across a virtual set.
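
The recursive definition of objects, and the repeated use of a single primitive instance, can be sketched as a small data structure. The class and field names below are hypothetical and chosen for illustration only; the specification does not give the layout of the scene tree 701.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneObject:
    """One node of an illustrative scene tree."""
    name: str
    children: List["SceneObject"] = field(default_factory=list)
    primitive: bool = False  # True for shapes the renderer understands directly


def count_primitives(obj):
    """Count primitive references reachable from obj, descending recursively."""
    if obj.primitive:
        return 1
    return sum(count_primitives(child) for child in obj.children)


# A single polygon instance is referenced many times to build larger objects.
polygon = SceneObject("polygon", primitive=True)
blade = SceneObject("blade", children=[polygon, polygon])
propeller = SceneObject("propeller", children=[blade, blade, blade])
```

Here the propeller is built from three references to one blade, and each blade from two references to one polygon, so six primitive references result from a single stored polygon.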

Manipulation of the scene tree 701 is summarised in Figure 8. The
scene tree is a file which may be viewed and manipulated, though not in real
time, by a text editor 801. The text editor 801 is also able to perform
manipulations of the executable scripts 711. These are written in the C
programming language, and are compiled so that they may be automatically
executed at the beginning of each virtual set frame construction process.

A control interface supplies control data to the scene tree 701 and to
the animation functions 712. The purpose of this is to enable real time
control, or possibly synchronization, over various aspects of the virtual set.
For example, it may be desired that a virtual aeroplane should fly through the
virtual set, not at a predetermined time, but rather in response to a cue from
the programme producer. The camera interface 803 controls the way in which
the scene tree 701 is manipulated, in that data from the calibrated real
camera is used to define the perspective projection of the real world onto a
two dimensional plane.

Three-dimensional modelling is a time consuming task. For example,
the statue 203 shown in Figure 2 is a highly complex shape, and may even
have been determined by three dimensional white laser scanning of a real
object. Thus three dimensional models may be incorporated into the scene
tree, via a three dimensional model import process 804. This provides access
to a rich library of three dimensional shapes from a wide variety of sources.

Thus, before the scene tree 701 is interpreted as a description of a particular
instance in time of the virtual set, various data and/or electrical signals may
be used to determine conditional aspects of its structure. Once these external
influences have been taken into account, the scene tree is optimised in an
optimisation process 805. The optimisation process 805 attempts to ensure
that the structure of the scene tree that is supplied to the rendering process is
as efficient as possible. After optimisation, the scene tree is converted into a
display list in process 806.

The display list generating process 806 breaks down the scene tree
into vertices of object primitives which may then be supplied to the rendering
processor 509. The rendering processor can then connect vertices with lines,
fill polygons or other primitives with surfaces and textures, and perform other
tasks related to three-dimensional graphics rendering of object primitives.

The process 806 of generating a display list is detailed in Figure 9. In
process 901, the next object is selected. In process 902, object
transformations are concatenated. Each object, whether it is a primitive or
not, may be manipulated in a number of ways in order to perform animation
or related functions. These manipulations are combinations of movement or
translation, stretching or rotation. These basic transformations are known as
affine transformations. Each such manipulation is performed arithmetically by
evaluating a transformation matrix multiplied by the points which define the
vertices of an object. Given a set of points in three-dimensional virtual space,
generally referred to as vertices in world space, each vertex may be
multiplied sequentially by any number of transformation matrices, thus
enabling complex manipulations to be performed, without having to calculate
a unique equation for any one of an infinite variety of possible geometric
transformations.
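
Sequential multiplication of vertices by transformation matrices can be sketched as below. This is an illustration only: the function names are hypothetical, vertices are treated as column vectors (an assumption), and only stretching and rotation are shown, because a three-by-three matrix acting on (x, y, z) cannot express translation.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def scaling(sx, sy, sz):
    """Matrix that stretches by the given factors along x, y and z."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, sz]]


def rotation_z(angle):
    """Matrix that rotates by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_vertices(vertices, matrices):
    """Multiply every vertex by each matrix in turn, in list order."""
    result = []
    for v in vertices:
        for m in matrices:
            v = mat_vec(m, v)
        result.append(v)
    return result
```

For instance, applying `scaling(2, 2, 2)` and then `rotation_z(math.pi / 2)` to the vertex (1, 0, 0) first stretches it to (2, 0, 0) and then turns it onto the y axis, with no dedicated equation needed for the combined manipulation.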

Furthermore, by sequentially multiplying by several transformation
matrices, in the form of a list of transformations, it becomes possible to
remove transformation matrices from the list, and so undo effects which turn
out to be undesirable. This is the general approach adopted in most two
dimensional and three dimensional graphics systems. The process of
multiplying by a list of matrices is known as matrix concatenation. Matrices
may be used for special operations, other than modifying position or shape in
world space, including projecting a view of a three dimensional model into a
two dimensional plane, such as that of a video frame.

A non-intuitive aspect of transformation matrices is that matrices for
use in two-dimensions are defined as three-by-three matrices, and three
dimensional transformations are accomplished using four-by-four
transformation matrices. The co-ordinate system used in a four-by-four matrix
system is not x, y, z, but x/w, y/w, z/w and w. The variable w is not a physically
measurable quantity, but provides a mathematical representation that makes
the general technique of matrix concatenation possible.
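
One way to see why the extra co-ordinate is needed is to consider translation by (t_x, t_y, t_z), which no three-by-three matrix acting on (x, y, z) can express. With w equal to unity it becomes an ordinary matrix product. The column-vector convention and the symbols t_x, t_y and t_z are assumptions made for this illustration:

```latex
\[
\begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{pmatrix}
\]
```

Because translation, rotation and stretching then all take the same four-by-four form, any sequence of them may be concatenated by matrix multiplication alone.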

As objects may be defined recursively, in process 902, the object is
analyzed into its lowest constituent objects. Then, working back up the
recursive data structure, transformations at each level are concatenated onto
the list of vertices which are defined as making up the object at the current
level of recursion. In this way, for example, the propeller of a virtual model
aeroplane may rotate. This propeller is itself part of a larger object, the
aeroplane, which flies from one side of the studio to the other. Thus a
transformation of rotation is concatenated for the propeller object, and then
transformations defining the path of flight are concatenated for the plane
object. Considering a single vertex on the propeller, this will have rotation and
the various path of flight transformations concatenated to it, while other parts
of the aeroplane will have only the path of flight transformations. This,
therefore, is the highly structured approach to three-dimensional modelling
which is adopted when defining objects for use in a virtual studio.
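
The propeller and aeroplane example can be sketched as a recursive traversal. The dictionary layout of each node and the function names are hypothetical. The sketch accumulates transformations from the top of the tree downwards, which gives the same result as concatenating while working back up from the lowest level. Column vectors of the form (x, y, z, w) are assumed.

```python
import math

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a vertex (x, y, z, w)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rotation_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def flatten(node, parent=IDENTITY):
    """Yield (name, transformed vertex) for every vertex at or below node."""
    # The node's own transformation is applied first, then its ancestors'.
    total = mat_mul(parent, node.get("transform", IDENTITY))
    for v in node.get("vertices", []):
        yield node["name"], mat_vec(total, v)
    for child in node.get("children", []):
        yield from flatten(child, total)


propeller = {"name": "propeller", "transform": rotation_x(math.pi / 2),
             "vertices": [(0.0, 1.0, 0.0, 1.0)]}
plane = {"name": "aeroplane", "transform": translation(10.0, 0.0, 0.0),
         "vertices": [(0.0, 0.0, 0.0, 1.0)], "children": [propeller]}
```

Calling `dict(flatten(plane))` moves the aeroplane vertex only along the path of flight, whereas the propeller vertex is first rotated and then moved along the same path.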

In process 903, a viev»nng matiix is concatenated, in addition to 
whatever other transformations have already been concatenated. The 
viewing matiix is a special matrix, defined by tine location of the real camera, 

30 and is required in order to simplify projection of the three-dimensional worid 
space into a two din>ensional plane whkih will t>e performed in process 904. 






The world space in which objects are defined by the scene tree may
be considered as a fixed volume, with any point in it defined by an x, y, z
co-ordinate, but with the four co-ordinate system (x/w, y/w, z/w and w) being
preferred. The initial non-transformed state of any vertex has the value w
equal to unity, so x/w, y/w and z/w are in fact equal to x, y and z before
transformations have been applied. At some stage in the rendering process,
it will be necessary to project an image onto a two-dimensional plane, which
may be considered as the plane of the image focused in the virtual camera,
and the image of the virtual world which would be displayed on a monitor.

This two-dimensional projection has a variable angle with respect to
the x, y and z axes of the virtual world space. An equation may be used to
define this plane, in terms of the x, y, z co-ordinates of world space. Then it
might be possible to project the three-dimensional model onto this plane
using basic geometrical equations. In three dimensions, this approach
requires considerable calculation, and a simpler solution is to rotate and
move all objects in world space so that the projection plane is defined by the
xy axes, and is perpendicular to the z axis. Thus, concatenation of the
viewing matrix, performed in process 903, rotates and moves any object in
world space so that the system of co-ordinates is normalized to the location
of the projection plane. Another way of viewing this is that the virtual camera
remains still while the virtual world moves around it, corresponding to a fixed
real world that is viewed by a moving real camera. The relative movements
are identical.
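
A minimal sketch of this idea follows, under simplifying assumptions that are not taken from this specification: the camera is described only by a position and a rotation (yaw) about the y axis, it looks along its own z axis, and the co-ordinates are normalized so that the camera sits at the origin. The function view_matrix is a hypothetical name. The matrix undoes the camera movement and then the camera rotation, so the world moves while the camera stays still.

```python
import math

def matmul(a, b):
    """Concatenate two 4x4 matrices; the result applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def view_matrix(cam_pos, yaw):
    """Move and rotate the world so the camera is at the origin, facing +z."""
    cx, cy, cz = cam_pos
    c, s = math.cos(yaw), math.sin(yaw)
    undo_move = [[1, 0, 0, -cx], [0, 1, 0, -cy], [0, 0, 1, -cz], [0, 0, 0, 1]]
    undo_turn = [[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1]]
    return matmul(undo_turn, undo_move)

cam_pos, yaw = (3.0, 1.5, -2.0), math.radians(30)
view = view_matrix(cam_pos, yaw)

# A world point at the camera position, and one a unit distance ahead of it.
at_camera = apply(view, [3.0, 1.5, -2.0, 1.0])
ahead = apply(view, [3.0 + math.sin(yaw), 1.5, -2.0 + math.cos(yaw), 1.0])
```

After the viewing matrix is applied, the point at the camera lies at the origin and the point ahead of the camera lies on the z axis, so the projection plane can be taken as parallel to the xy axes.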

In process 904, perspective projection of the currently selected object
onto the projection plane is performed by concatenating a projection matrix.
Note, however, that the z co-ordinate is not discarded or set to zero, as this is
required in order to perform hidden surface removal.

In process 905, object culling is performed. Objects which lie outside
the xy co-ordinate range of the projection plane are discarded, as are objects
which are too close to or too far from the virtual camera. For example, objects
which are behind the virtual camera might otherwise be displayed as being
inverted, when they should not be displayed at all.

In process 907, the resulting vertices are added to the display list,
along with a reference to the object primitives which they define, and other
details, such as the type of surface, texture, specular reflectivity and so on.
This information will later be used by the graphics rendering processor 509,
which has highly optimised circuits for translating this information into frame
pixel data in real time.

In process 908, a question is asked as to whether any other objects
remain to be added to the display list. If no other objects remain, the display
list is supplied to the graphics pipeline of the rendering processor 509.
Construction of the display list takes a variable amount of time, depending on
the number and complexity of the objects and transformations which it
defines. Thus the display list may be produced well in advance of the next
frame, or possibly take longer than one frame to calculate. The graphics
pipeline is a concept which synchronizes display lists with video frame
outputs. Thus, when a display list is early, it is stored in the pipeline until it is
needed. If the display list cannot be generated in time for the next frame, the
previous display list is used, thereby minimising the visible effects. Clearly,
though, this is a situation which is avoided if at all possible, as it reduces the
realism of the resulting image.
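
The fallback behaviour of the graphics pipeline may be sketched as follows. The function frames_to_render and the use of None to mark a late display list are assumptions made for illustration only; the actual pipeline is implemented in the rendering processor 509.

```python
def frames_to_render(display_lists):
    """Pick the display list shown at each video frame.

    display_lists[i] is the list ready in time for frame i, or None if its
    construction overran, in which case the previous list is shown again.
    """
    shown, previous = [], None
    for ready in display_lists:
        current = ready if ready is not None else previous
        shown.append(current)
        previous = current
    return shown

shown = frames_to_render(["list0", "list1", None, "list3"])
```

In this sketch the third frame repeats "list1", because its own display list was not ready in time.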

Due to the amount of parallel processing which occurs in the system,
a delay of a few frames is incurred. Thus the image of the combined virtual
world and the real world is noticeably delayed in time by a fraction of a
second with respect to the real time. This delay is related to the processing
capacity of the computer hardware used to render the virtual images, and
may be expected to decrease as more processing power becomes available.

The sequence of steps shown in Figure 9 results in an image being
drawn by the rendering processor 509. All objects seen by the virtual camera
have sharp focus, whereas only those objects which are in the plane of focus
in the real studio will have such a sharp focus. Thus, if the real camera 103
zooms in on the statue 203 shown in Figure 2, the virtual backdrop 204
remains perfectly in focus. This results in a departure from the ideal of the
virtual studio, where all objects (real or virtual) appear to exist within a single
coherent studio, passing through camera optics which do not differ for real
or virtual images.

An improved procedure is shown in Figure 10. In process 1001, a
counter N is reset to the value zero. In process 1002, four pixel plane buffers
are reset to zero. Each buffer contains a single memory location for each
pixel, each memory location being defined by a certain number of bits,
depending on the accuracy required. Thus there are pixel plane buffers for
red, green and blue colour pixels. In addition, a z buffer is used to facilitate
hidden surface removal, by storing a z value for each pixel. As each object is
rendered, red, green and blue pixels may only be written to if the z value for
the new object is greater than the z value presently held for that pixel in the z
buffer.
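
The write rule for the four buffers may be sketched as follows. The buffer size, the function write_pixel and the example values are hypothetical. The comparison follows the rule stated above, namely that a pixel is written only when the new z value is greater than the stored one, with all buffers initially reset to zero.

```python
WIDTH, HEIGHT = 4, 1

# Four pixel plane buffers, all reset to zero (process 1002).
red = [[0] * WIDTH for _ in range(HEIGHT)]
green = [[0] * WIDTH for _ in range(HEIGHT)]
blue = [[0] * WIDTH for _ in range(HEIGHT)]
zbuf = [[0] * WIDTH for _ in range(HEIGHT)]

def write_pixel(x, y, z, rgb):
    """Write a colour only if z exceeds the z value already stored."""
    if z > zbuf[y][x]:
        red[y][x], green[y][x], blue[y][x] = rgb
        zbuf[y][x] = z
        return True
    return False

first = write_pixel(0, 0, 2, (15, 0, 0))   # accepted: 2 > 0
second = write_pixel(0, 0, 1, (0, 15, 0))  # rejected: 1 is not > 2
third = write_pixel(1, 0, 3, (0, 0, 15))   # accepted at a different pixel
```

The rejected write leaves the red value of the first pixel unchanged.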

In process 1003, a projection matrix is calculated, in which the position
of the virtual camera is shifted slightly in the x and y dimensions. The matrix is
calculated such that the degree of shift is proportional to the z value of the
vertex which is being projected. The plane in focus is considered to have a z
value of zero. Objects in front of or behind the plane in focus have z values
of increasing magnitude, in the negative or positive domain, resulting in
increasingly larger degrees of shift. The plane in focus is known from the
measurement of the displacement of the focus ring on the lens 104 of the
real camera 103, which is used in conjunction with calibration data for that
particular lens to give a particular z value for the plane in focus, for each
frame of video which is generated by the camera.

Thus, as the camera operator manually adjusts the focus ring on the
lens, this information is supplied to the graphics processor, and used to
calculate a viewing matrix, which is concatenated onto vertices of objects in
step 903 in Figure 9, such that the position of the plane in focus is always
normalized to a z value of zero before projection occurs in step 904.

In step 1004, the scene is drawn in accordance with the processes
shown in Figure 9. This results in a particular image being stored in memory,
which represents the view generated by the varied projection matrix
calculated in process 1003. In process 1005, this image is accumulated with
previously generated images resulting from the varied projection matrix.
Thus, each red pixel generated for the current iteration of the scene is
accumulated with previous iterations of the scene. In a large solid object,
located outside the plane in focus, most of the area of the object which is
rendered will be the result of several accumulated scene drawing iterations.
However, at the edges of the object, due to the slight offset of each drawn
scene in the x and y dimensions, there will be a number of less intense
renderings of the object, which provide the illusion of defocus.

In process 1006, the counter N is incremented. In process 1007, a
question is asked as to whether the value of N is now equal to a predefined
quality value. If N is less than this amount, control is directed to process
1002, and another version of the scene is drawn, this time from a slightly
different viewpoint. Thus, the number of times this loop is performed depends
on the quality of de-focus which is required. Clearly it takes time to draw a
scene several times, and different values of quality may be selected
accordingly.

In process 1008, the accumulated buffer contents are divided by N,
the number of times the loop has been executed. Thus, if a red pixel having
the true colour value 15 is written to the accumulated pixel plane eight times,
the resulting accumulated red pixel value will be 120. By dividing this amount
by N, the true colour value returns to 15. If the red pixel was at the edge of a
de-focused object, it is possible that several different values of red will be
accumulated. Dividing by N results in the average of these being used in the
final image, thus achieving the change in colour intensity required for the
defocus effect. Once the buffer contents have been divided by N in process
1008, control is directed to process 1009, where it is known that the defocus
for the current frame has been completed.
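
The accumulate-and-divide arithmetic may be sketched on a single row of red pixels. The function draw_scene is a hypothetical stand-in for the processes of Figure 9, drawing a red object four pixels wide with the value 15, and the whole-pixel shifts are chosen only for illustration; in the described system the shift results from the varied projection matrix.

```python
def draw_scene(shift, width=8):
    """Hypothetical stand-in for Figure 9: one row with a shifted red object."""
    line = [0] * width
    for x in range(2 + shift, 6 + shift):
        if 0 <= x < width:
            line[x] = 15
    return line

def defocus(shifts, width=8):
    """Accumulate one drawing per shift, then divide by the iteration count N."""
    accum = [0] * width
    for shift in shifts:
        for x, value in enumerate(draw_scene(shift, width)):
            accum[x] += value
    n = len(shifts)
    return [total / n for total in accum]

sharp = defocus([0] * 8)          # eight identical drawings: 120 / 8 = 15
blurred = defocus([-1, 0, 1, 0])  # four drawings from shifted viewpoints
```

With no shift, every pixel of the object accumulates to 120 and returns to 15 after division. With shifted viewpoints, the interior remains at 15 while the edge pixels take the intermediate values 11.25 and 3.75.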

The varied projection matrix used in process 1003 in Figure 10 is
detailed in Figure 11. Also shown in this Figure are equations for calculating
dx and dy, which are the small increments in x and y used to generate the
defocus effect. dx and dy are respectively dependent on several other
variables, defined elsewhere in the graphics system, including kx and ky. kx
and ky are constants, determined experimentally, which define the degree of
shift produced at each iteration of the defocus loop. Other values are right,
left, top and bottom, which are the minimum and maximum x and y values for
the projection plane. The window resolutions in x and y are used, so that kx
and ky may be defined in terms of pixel units.

The operation of a four-by-four matrix is shown in Figures 12A, 12B
and 12C. In Figure 12A a four-by-four transformation matrix is shown. As
stated earlier, four dimensions are used for reasons of mathematical
expediency. The fourth dimension is w, and the x, y, z physical dimensions are
replaced by x/w, y/w and z/w. Typically, vertices start off with a w value of
one. It is only during perspective viewing, or certain other unusual
transformations, that a vertex includes a non-unity value of w.

In Figure 12A, the vertical columns represent the x'/w', y'/w', z'/w' and w'
outputs, while the horizontal rows represent the x/w, y/w, z/w and w inputs. At
each intersection of a row and a column is a value or a function which may
be evaluated. The combinations of these functions define how the input x/w,
y/w, z/w, w vertex co-ordinates are translated into their respective x'/w', y'/w',
z'/w' and w' co-ordinates. The relationships between input co-ordinates and
output co-ordinates are defined according to the equations shown in Figure
12B. It may be seen that each output co-ordinate may be defined by any
mathematical relationship of the four input co-ordinates.

Typically, in most matrices, many of the matrix intersections will be set
to zero, so that, for example, x'/w' does not depend on y/w if b is set to zero.
The power of the additional w co-ordinate may be appreciated when Figure
12C is considered. Here, the x', y' and z' co-ordinates are recovered from
x'/w', y'/w', z'/w' and w'. The values x', y' and z' may all be modified if the
value of w' has changed at some point in the matrix calculations. This fact
enables far more complex equations to be represented in matrix form than if
only the three physical dimensions are used. Co-ordinates of this type are
known as homogeneous co-ordinates.
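
The effect of a non-unity w may be sketched with a toy perspective matrix. This is an assumption-laden illustration, not the matrix of Figure 12A: it assumes column vectors, a camera at the origin looking along z, a projection plane at distance d, and the usual convention that physical co-ordinates are obtained by dividing the first three components by the fourth. The names perspective and to_physical are hypothetical. Note that this toy matrix collapses every depth to d; a practical projection matrix would retain a usable z value for hidden surface removal.

```python
def perspective(d):
    """Toy matrix whose last row sets w' = z / d (plane at distance d)."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 1.0 / d, 0]]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def to_physical(v):
    """Recover physical co-ordinates by dividing through by w."""
    x, y, z, w = v
    return [x / w, y / w, z / w]

projected = apply(perspective(4.0), [2.0, 4.0, 8.0, 1.0])
physical = to_physical(projected)
```

The matrix itself contains only constants, yet after the division x and y have been scaled by d/z, a non-linear operation that a matrix acting on three co-ordinates alone could not express.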

It is therefore possible to write out the operations represented by the
matrix shown in Figure 11 in a direct algebraic form. Here it may be seen that
the deviation in x and y is made proportional to z, so that no deviation occurs
for objects in the plane in focus, for which z is zero. The other aspects of
these equations relate to projection of a line from a vertex through a vertical
two-dimensional plane at the plane in focus, through to the front nodal point
of the camera lens. The front nodal point of the lens is the point through which
rays theoretically converge. This point changes depending on the zoom
position. Thus the front nodal point is calculated from the current position of
the zoom lens in conjunction with calibration data for the lens from which the
front nodal point may be derived.
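
The z-proportional deviation alone may be sketched as follows. This covers only the shift terms, x' = x + dx*z and y' = y + dy*z; the projection terms of the Figure 11 matrix and its equations for dx and dy are not reproduced here, so the function shifted and the values of dx and dy are illustrative assumptions.

```python
def shifted(vertex, dx, dy):
    """Apply x' = x + dx*z and y' = y + dy*z as a 4x4 matrix product."""
    m = [[1, 0, dx, 0],
         [0, 1, dy, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    return [sum(m[i][k] * vertex[k] for k in range(4)) for i in range(4)]

# The plane in focus has z equal to zero, so a vertex in it does not move.
in_focus = shifted([3.0, 2.0, 0.0, 1.0], dx=0.5, dy=0.25)
behind = shifted([3.0, 2.0, 4.0, 1.0], dx=0.5, dy=0.25)
in_front = shifted([3.0, 2.0, -4.0, 1.0], dx=0.5, dy=0.25)
```

Vertices on opposite sides of the plane in focus are displaced in opposite directions, and the displacement grows with distance from that plane.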

Figure 14 shows the effect of repeated shifting of the viewpoint on an
object which is outside the plane in focus. The main part of the object has its
true colour, while the edges differ in intensity according to the number of
times the red, green and blue values for the object were accumulated into the
respective pixel buffers.

Figure 15 shows a top-down plan of a virtual set. Two camera
positions, 1501 and 1502, are separated by a distance dx calculated
according to the equation shown in Figure 11. Objects 1504, 1505 and 1506
in the plane in focus 1503 do not shift, as their z co-ordinates are zero. A
virtual object 1507 far from the plane in focus 1503 is shifted considerably.
Figure 15 represents the situation after two iterations of the defocus loop
shown in Figure 10. In practice a larger number of iterations will usually be
considered necessary to achieve a suitable level of quality, a typical number
being in the region of four to eight.






Claims 

1. A method of processing image data, wherein real image data
generated by a camera is combined with synthesized image data, comprising
steps of

generating camera positional data representing characteristics of said
camera, including an indication of zoom control;

generating a synthesized image in response to said positional data; and

adjusting the perceived focus of said synthesized image in response
to zoom control adjustments, so as to effect a focusing difference between a
portion of said real image and a portion of said synthesized image.

2. A method according to Claim 1, wherein a portion of said
synthesized image is de-focused to emphasise its location behind said
portion of the real image.

3. A method according to Claim 1, wherein a portion of the
synthesized image is de-focused to emphasise its location in front of said
portion of the real image.

4. A method according to Claim 2 and Claim 3, wherein
de-focusing of a synthesized image includes de-focusing portions both behind
and in front of said real image.

5. A method according to any of Claims 1 to 4, wherein
de-focusing is effected by varying projection matrices.

6. A method according to any of Claims 1 to 5, wherein
de-focusing is performed a plurality of times where the value of said plurality is
adjustable.

7. A method according to any of Claims 1 to 6, wherein calculated
pixel values are accumulated on each iteration and said accumulated values
are divided by the number of iterations made.

8. Apparatus for processing image data, comprising a camera
arranged to generate real image data;

synthesizing means arranged to synthesize image data;

positional data generating means arranged to generate camera
positional data representing characteristics of said camera, including an
indication of zoom control, wherein

said synthesizing means is arranged to generate a synthesized image
in response to said positional data, and

said synthesizing means is arranged to adjust the perceived focus of
said synthesized image in response to zoom control adjustments, so as to
effect a focusing difference between a portion of said real image and a
portion of said synthesized image.

9. Apparatus according to Claim 8, wherein said synthesizing
means is arranged to defocus a portion of said synthesized image to
emphasise its location behind said portion of the real image.

10. Apparatus according to Claim 8, wherein said synthesizing
means is arranged to defocus a portion of said synthesized image to
emphasise its location in front of said real image.

11. Apparatus according to Claim 8, including accumulating means
for accumulating pixel values generated by a plurality of defocusing
operations.

12. A method or an apparatus substantially as herein described
with reference to the accompanying drawings.



Application No: GB 9705970.3        Examiner: Joe McCann
Claims searched: All                Date of search: 12 June 1997

Patents Act 1977

Search Report under Section 17

Databases searched:

UK Patent Office collections, including GB, EP, WO & US patent specifications, in:
UK Cl (Ed.O): H4F (FGJ, FGM); H4D (DLAB, DLFB, DLVX)
Int Cl (Ed.6): H04N (5/14, 5/222, 5/265, 5/272, 9/74); G06T (15/10, 17/00)
Other: Online: WPI

Documents considered to be relevant:

Category  Identity of document and relevant passage               Relevant to claims

X, E      GB 2305050 A (ORAD HI-TEC SYSTEMS LIMITED) -            1, 8
          see page 3

X         GB 2271241 A (BBC) - see page 4                         1, 8

A         US 4970666 (WELSH ET AL) - see abstract                 1, 8

X         SMPTE Journal vol 103, no 6, June 1994, Fukui, K;       1, 8
          Hayashi, M; Yamanouchi, Y. "A Virtual studio system
          for TV program production", pages 386 to 390.

X  Document indicating lack of novelty or inventive step.
Y  Document indicating lack of inventive step if combined with one or more
   other documents of same category.
&  Member of the same patent family.
A  Document indicating technological background and/or state of the art.
P  Document published on or after the declared priority date but before
   the filing date of this invention.
E  Patent document published on or after, but with priority date earlier
   than, the filing date of this application.

An Executive Agency of the Department of Trade and Industry



